library(topGO)
library(knitr) # to use kable()
library(limma) # to use vennDiagram()
library(ggplot2)
TAG <- params$TAG
VAR <- params$VAR
ANNOTATION <- list(genes = '../2019-07-26/genes/annotation.txt',
isoforms = '../2019-07-26/transcripts/annotation.txt')
TAGDIR <- paste0('../2020-01-08/', TAG, '/')
Introduction
This is the enrichment analysis for isoforms regulated by hatching. Because I quantified the association of expression with hatching in two different ways, this document has two sections. Section Variance reports the enrichment analysis performed with isoforms ordered by the proportion of expression variance explained by hatching. Note that some isoforms with low variance may still be highly associated with hatching, even if the fold change between levels of this factor is low. Section Differential expression uses an ordering of isoforms based on the significance of the differential expression between levels of hatching, which does depend on fold change.
Reading the data
Functional annotation is in 2019-07-26. I will also upload two lists of isoforms, with either proportion of variance explained by hatching or p-value of differential expression test.
tagVariance <- read.table(paste0(TAGDIR, VAR, '_variance.txt'))
tagPValue <- read.table(paste0(TAGDIR, VAR, '_pvalue.txt'))
annotation <- read.table(ANNOTATION[[TAG]], col.names = c('tagname', 'goterms'))
To initialize the topGOdata object, I will need the gene list as a named numeric vector, where the names are the isoforms identifiers and the numeric values, either the portion of variance explained by hatching or the p-values of the differential expression test. The structure() function adds attributes to an object.
Variance <- structure(tagVariance[,1], names = row.names(tagVariance))
PValues <- structure(tagPValue[,1], names = row.names(tagPValue))
rm(tagVariance, tagPValue)
There are 33037 isoforms scored with a variance portion and a differential expression p-value. It should actually be the exact same isoforms. The annotation data frame has more than one GO term for every tag, separated by |. I need a named list.
head(annotation)
## tagname
## 1 TCONS_00000002
## 2 TCONS_00000010
## 3 TCONS_00000011
## 4 TCONS_00000016
## 5 TCONS_00000017
## 6 TCONS_00000018
## goterms
## 1 GO:0008417|GO:0016020|GO:0006486
## 2 GO:0043130|GO:0005515|GO:0043161
## 3 GO:0005840|GO:0015935|GO:0003735|GO:0006412
## 4 GO:0006355|GO:0003700|GO:0006357|GO:0043565|GO:0005634|GO:0032502
## 5 GO:0006355|GO:0003700|GO:0006357|GO:0043565|GO:0005634|GO:0032502
## 6 GO:0006355|GO:0003700|GO:0006357|GO:0043565|GO:0005634|GO:0032502
allgenes2GO <- strsplit(as.character(annotation$goterms), "|", fixed = TRUE)
names(allgenes2GO) <- annotation$tagname
rm(annotation)
There are 19735 isoforms with GO annotations. But the differential expression analysis includes many more isoforms. In order to include the not-annotated isoforms in the enrichment analysis, to see if annotated or not annotated isoforms are more or less often differentially expressed, I should attribute a GO term to them. According to [http://geneontology.org/docs/faq/] nowadays we express lack of annotation by annotating to the root nodes, i.e. GO:0008150 biological_process, GO:0003674 molecular_function, and GO:0005575 cellular_component.
for (tag in unique(c(names(PValues), names(Variance)))) {
if (! tag %in% names(allgenes2GO)) {
allgenes2GO <- append(allgenes2GO,
structure(list(c("GO:0008150", "GO:0003674", "GO:0005575")), names = tag))
}
}
Using differential expression p-values
Building the topGO object
Creation of a topGO dataset is documented in section 4 of topGO’s the user manual: https://bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf. I need to use the new command, and fill up the slots. The annot object must be a function that compiles “a list of GO terms such that each element in the list is a character vector containing all the gene identifiers that are mapped to the respective GO term.” There are several options, that you can check using help(annFUN.gene2GO), for example. The annFUN.gene2GO requires a gene2GO argument, which is the list of gene-to-GO terms I made before. The geneSelectionFun object is a function that selects the interesting (most significant) genes. It is required to perform Fisher’s exact test. The nodeSize is used to prune the GO hierarchy from the terms which have less than n annotated genes.
I generate three datasets, to analyse each of the three ontologies.
selection <- function(allScores) {return(allScores < 0.05)}
GOdata.BP <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'BP',
allGenes = PValues,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
GOdata.MF <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'MF',
allGenes = PValues,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
GOdata.CC <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'CC',
allGenes = PValues,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
DataSummary <- data.frame(ontology = c('BP', 'MF', 'CC'),
Num_Genes = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), numGenes),
Num_GO_terms = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), function(x) length(usedGO(x))))
kable(DataSummary, caption='Number of feasible genes or transcripts and number of GO terms used in each data set.')
rm(DataSummary)
Number of feasible genes or transcripts and number of GO terms used in each data set.
| BP |
26235 |
1582 |
| MF |
30580 |
791 |
| CC |
21616 |
390 |
Running the tests
There are more than one way to test for enrichment. Something that took me a while to understand is that not only there are different test statistics (Fisher’s exact test, Kolmogorov-Smirnov, and others) but also different algorithms: classic, elim, weight… The algorithms are ways to deal with the dependence structure among GO terms due to topology. Some algorithms are compatible with all statistics implemented in topGO. But the weight and the parentchild algorithms are only applicable to Fisher’s exact test. I am not interested in the classic algorithm, which treats GO nodes as independent, and therefore produces an excess of significant results. I will not use the Fisher’s exact test, because it dependes on an arbitrary threshold of significance on non-adjusted p-values.
BP.elim <- runTest(GOdata.BP, algorithm = "elim", statistic = "ks")
BP.weight01 <- runTest(GOdata.BP, algorithm = "weight01", statistic = "ks")
BP.lea <- runTest(GOdata.BP, algorithm = "lea", statistic = "ks")
MF.elim <- runTest(GOdata.MF, algorithm = "elim", statistic = "ks")
MF.weight01 <- runTest(GOdata.MF, algorithm = "weight01", statistic = "ks")
MF.lea <- runTest(GOdata.MF, algorithm = "lea", statistic = "ks")
CC.elim <- runTest(GOdata.CC, algorithm = "elim", statistic = "ks")
CC.weight01 <- runTest(GOdata.CC, algorithm = "weight01", statistic = "ks")
CC.lea <- runTest(GOdata.CC, algorithm = "lea", statistic = "ks")
ResultsSummary <- data.frame(ontology = rep(c("BP", "MF", "CC"), each = 3),
algorithm = rep(c("elim", "weight01", "lea"), 3),
TermsTested = sapply(list(BP.elim, BP.weight01, BP.lea, MF.elim, MF.weight01, MF.lea, CC.elim, CC.weight01, CC.lea), function(X) length(score(X))),
Significant = sapply(list(BP.elim, BP.weight01, BP.lea, MF.elim, MF.weight01, MF.lea, CC.elim, CC.weight01, CC.lea), function(X) sum(score(X) < 0.01)))
kable(ResultsSummary, caption="Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.")
Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.
| BP |
elim |
1582 |
22 |
| BP |
weight01 |
1582 |
10 |
| BP |
lea |
1582 |
34 |
| MF |
elim |
791 |
10 |
| MF |
weight01 |
791 |
13 |
| MF |
lea |
791 |
13 |
| CC |
elim |
390 |
2 |
| CC |
weight01 |
390 |
1 |
| CC |
lea |
390 |
3 |
rm(ResultsSummary)
Results
The topGO package does not pay much attention to what terms are significant because they are overrepresented and which ones are underrepresented among the most differentially expressed genes. I think it’s worth separating them, to facilitate the biological interpretation. Note that not all terms listed in the tables below are significant. The scores for the three methods (elim, weight01 and lea) are non-corrected p-values.
Biological process
orderedTerms <- names(sort(score(BP.weight01)))
significant.weight01 <- score(BP.weight01)[orderedTerms] <= 0.01
significant.lea <- score(BP.lea)[orderedTerms] <= 0.01
significant.elim <- score(BP.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
BP.pvalue.sigTerms <- sigTerms
BP.all <- GenTable(GOdata.BP, elim=BP.elim, weight01=BP.weight01, lea=BP.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
BP.all[BP.all$Significant > BP.all$Expected,],
caption = "Most over-represented terms of the Biological Process ontology.")
Most over-represented terms of the Biological Process ontology.
| 1 |
GO:0001522 |
pseudouridine synthesis |
31 |
4 |
1.56 |
2 |
0.0014 |
0.0014 |
0.00139 |
| 2 |
GO:0046854 |
phosphatidylinositol phosphorylation |
45 |
4 |
2.26 |
4 |
0.0027 |
0.0027 |
0.00274 |
| 3 |
GO:0006400 |
tRNA modification |
58 |
7 |
2.91 |
48 |
0.0157 |
0.0035 |
0.01574 |
| 4 |
GO:0050684 |
regulation of mRNA processing |
15 |
3 |
0.75 |
13 |
0.0060 |
0.0041 |
0.00599 |
| 5 |
GO:0030513 |
positive regulation of BMP signaling pat… |
9 |
4 |
0.45 |
9 |
0.0053 |
0.0053 |
0.00527 |
| 6 |
GO:0016579 |
protein deubiquitination |
90 |
9 |
4.52 |
11 |
0.0058 |
0.0058 |
0.00577 |
| 7 |
GO:0060271 |
cilium assembly |
73 |
4 |
3.67 |
8 |
0.0048 |
0.0060 |
0.00037 |
| 8 |
GO:0070286 |
axonemal dynein complex assembly |
9 |
1 |
0.45 |
18 |
0.0078 |
0.0078 |
0.00778 |
| 9 |
GO:0048015 |
phosphatidylinositol-mediated signaling |
40 |
4 |
2.01 |
22 |
0.0087 |
0.0087 |
0.00873 |
| 10 |
GO:0055072 |
iron ion homeostasis |
21 |
3 |
1.06 |
575 |
0.3722 |
0.0095 |
0.37225 |
| 11 |
GO:0007411 |
axon guidance |
11 |
3 |
0.55 |
30 |
0.0119 |
0.0119 |
0.01186 |
| 13 |
GO:0015937 |
coenzyme A biosynthetic process |
8 |
1 |
0.40 |
41 |
0.0130 |
0.0130 |
0.01296 |
| 14 |
GO:0046434 |
organophosphate catabolic process |
52 |
3 |
2.61 |
924 |
0.6194 |
0.0155 |
0.61935 |
| 15 |
GO:0060828 |
regulation of canonical Wnt signaling pa… |
5 |
2 |
0.25 |
45 |
0.0157 |
0.0157 |
0.01569 |
| 18 |
GO:0048017 |
inositol lipid-mediated signaling |
50 |
5 |
2.51 |
51 |
0.0178 |
0.0178 |
0.00266 |
| 19 |
GO:0006654 |
phosphatidic acid biosynthetic process |
10 |
1 |
0.50 |
52 |
0.0178 |
0.0178 |
0.01785 |
| 20 |
GO:0007275 |
multicellular organism development |
85 |
7 |
4.27 |
136 |
0.0616 |
0.0185 |
0.00270 |
| 21 |
GO:0006511 |
ubiquitin-dependent protein catabolic pr… |
204 |
12 |
10.25 |
375 |
0.2410 |
0.0187 |
0.24104 |
| 22 |
GO:0007040 |
lysosome organization |
8 |
2 |
0.40 |
55 |
0.0195 |
0.0195 |
0.01949 |
kable(
BP.all[BP.all$Significant < BP.all$Expected,],
caption = "Most under-represented terms of the Biological Process ontology.")
Most under-represented terms of the Biological Process ontology.
| 12 |
GO:0006418 |
tRNA aminoacylation for protein translat… |
87 |
2 |
4.37 |
84 |
0.0368 |
0.0122 |
0.03681 |
| 16 |
GO:0050953 |
sensory perception of light stimulus |
11 |
0 |
0.55 |
852 |
0.5606 |
0.0165 |
0.56064 |
| 17 |
GO:0006030 |
chitin metabolic process |
103 |
5 |
5.17 |
128 |
0.0581 |
0.0175 |
0.05810 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))

kable(data.frame(Term = Term(GOTERM[sigTerms]),
Definition = Definition(GOTERM[sigTerms]),
PValue=score(BP.weight01)[sigTerms]),
caption = paste('Biological process terms significantly associated with',
VAR, 'according to all 3 algorithms', sep=' '))
Biological process terms significantly associated with hatching according to all 3 algorithms
| GO:0001522 |
pseudouridine synthesis |
The intramolecular conversion of uridine to pseudouridine within an RNA molecule. This posttranscriptional base modification occurs in tRNA, rRNA, and snRNAs. |
0.0013926 |
| GO:0046854 |
phosphatidylinositol phosphorylation |
The process of introducing one or more phosphate groups into a phosphatidylinositol, any glycerophosphoinositol having one phosphatidyl group esterified to one of the hydroxy groups of inositol. |
0.0027365 |
| GO:0050684 |
regulation of mRNA processing |
Any process that modulates the frequency, rate or extent of mRNA processing, those processes involved in the conversion of a primary mRNA transcript into a mature mRNA prior to its translation into polypeptide. |
0.0040779 |
| GO:0030513 |
positive regulation of BMP signaling pathway |
Any process that activates or increases the frequency, rate or extent of BMP signaling pathway activity. |
0.0052673 |
| GO:0016579 |
protein deubiquitination |
The removal of one or more ubiquitin groups from a protein. |
0.0057656 |
| GO:0060271 |
cilium assembly |
The assembly of a cilium, a specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface. Each cilium is bounded by an extrusion of the cytoplasmic membrane, and contains a regular longitudinal array of microtubules, anchored basally in a centriole. |
0.0059768 |
| GO:0070286 |
axonemal dynein complex assembly |
The aggregation, arrangement and bonding together of a set of components to form an axonemal dynein complex, a dynein complex found in eukaryotic cilia and flagella, in which the motor domain heads interact with adjacent microtubules to generate a sliding force which is converted to a bending motion. |
0.0077824 |
| GO:0048015 |
phosphatidylinositol-mediated signaling |
A series of molecular signals in which a cell uses a phosphatidylinositol-mediated signaling to convert a signal into a response. Phosphatidylinositols include phosphatidylinositol (PtdIns) and its phosphorylated derivatives. |
0.0087267 |
I think the GO graph is useful to see the relationship among the significant terms. But too large graphs are impossible to read. I don’t know how to split the graph in meaningful subgraphs.
showSigOfNodes(GOdata.BP, score(BP.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 266
## Number of Edges = 512
##
## $complete.dag
## [1] "A graph with 266 nodes."
This is just a example of the most significant GO term:
showGroupDensity(GOdata.BP, names(score(BP.weight01))[which.min(score(BP.weight01))])

Molecular function
orderedTerms <- names(sort(score(MF.weight01)))
significant.weight01 <- score(MF.weight01)[orderedTerms] <= 0.01
significant.lea <- score(MF.lea)[orderedTerms] <= 0.01
significant.elim <- score(MF.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
MF.pvalue.sigTerms <- sigTerms
MF.all <- GenTable(GOdata.MF, elim=MF.elim, weight01=MF.weight01, lea=MF.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
MF.all[MF.all$Significant > MF.all$Expected,],
caption = "Most over-represented terms of the Molecular Function ontology.")
Most over-represented terms of the Molecular Function ontology.
| 1 |
GO:0016747 |
transferase activity, transferring acyl … |
158 |
9 |
7.81 |
142 |
0.17607 |
0.00023 |
0.17607 |
| 3 |
GO:0005509 |
calcium ion binding |
638 |
34 |
31.52 |
3 |
0.00086 |
0.00086 |
0.00086 |
| 4 |
GO:0035091 |
phosphatidylinositol binding |
109 |
9 |
5.39 |
1 |
0.00043 |
0.00091 |
0.00043 |
| 7 |
GO:0009982 |
pseudouridine synthase activity |
27 |
2 |
1.33 |
5 |
0.00277 |
0.00277 |
0.00277 |
| 8 |
GO:0060090 |
molecular adaptor activity |
45 |
6 |
2.22 |
17 |
0.01348 |
0.00450 |
0.01348 |
| 9 |
GO:0005515 |
protein binding |
4270 |
227 |
210.99 |
27 |
0.02374 |
0.00452 |
0.01001 |
| 10 |
GO:0016788 |
hydrolase activity, acting on ester bond… |
539 |
30 |
26.63 |
52 |
0.06374 |
0.00478 |
0.03279 |
kable(
MF.all[MF.all$Significant < MF.all$Expected,],
caption = "Most under-represented terms of the Molecular Function ontology.")
Most under-represented terms of the Molecular Function ontology.
| 2 |
GO:0140101 |
catalytic activity, acting on a tRNA |
139 |
4 |
6.87 |
2 |
0.00065 |
0.00035 |
0.00065 |
| 5 |
GO:0004812 |
aminoacyl-tRNA ligase activity |
90 |
2 |
4.45 |
22 |
0.01794 |
0.00098 |
0.01794 |
| 6 |
GO:0004594 |
pantothenate kinase activity |
5 |
0 |
0.25 |
4 |
0.00254 |
0.00254 |
0.00254 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(MF.weight01)[sigTerms]),
caption = paste('Molecular function terms significantly associated with', VAR,
'according to all 3 algorithms', sep=' '))
Molecular function terms significantly associated with hatching according to all 3 algorithms
| GO:0140101 |
catalytic activity, acting on a tRNA |
NA |
0.0003490 |
| GO:0005509 |
calcium ion binding |
Interacting selectively and non-covalently with calcium ions (Ca2+). |
0.0008562 |
| GO:0035091 |
phosphatidylinositol binding |
Interacting selectively and non-covalently with any inositol-containing glycerophospholipid, i.e. phosphatidylinositol (PtdIns) and its phosphorylated derivatives. |
0.0009095 |
| GO:0004594 |
pantothenate kinase activity |
Catalysis of the reaction: ATP + pantothenate = ADP + D-4’-phosphopantothenate. |
0.0025374 |
| GO:0009982 |
pseudouridine synthase activity |
Catalysis of the reaction: RNA uridine = RNA pseudouridine. Conversion of uridine in an RNA molecule to pseudouridine by rotation of the C1’-N-1 glycosidic bond of uridine in RNA to a C1’-C5. |
0.0027714 |
| GO:0008417 |
fucosyltransferase activity |
Catalysis of the transfer of a fucosyl group to an acceptor molecule, typically another carbohydrate or a lipid. |
0.0048171 |
| GO:0004630 |
phospholipase D activity |
Catalysis of the reaction: a phosphatidylcholine + H2O = choline + a phosphatidate. |
0.0055804 |
showSigOfNodes(GOdata.MF, score(MF.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 38
## Number of Edges = 41
##
## $complete.dag
## [1] "A graph with 38 nodes."
showGroupDensity(GOdata.MF, names(score(MF.weight01))[which.min(score(MF.weight01))])

Cellular component
orderedTerms <- names(sort(score(CC.weight01)))
significant.weight01 <- score(CC.weight01)[orderedTerms] <= 0.01
significant.lea <- score(CC.lea)[orderedTerms] <= 0.01
significant.elim <- score(CC.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
CC.pvalue.sigTerms <- sigTerms
CC.all <- GenTable(GOdata.CC, elim=CC.elim, weight01=CC.weight01, lea=CC.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
CC.all[CC.all$Significant > CC.all$Expected,],
caption = "Most over-represented terms of the Cellular Component ontology.")
Most over-represented terms of the Cellular Component ontology.
| GO:1902555 |
endoribonuclease complex |
10 |
1 |
0.48 |
10 |
0.020 |
0.0041 |
0.020 |
kable(
CC.all[CC.all$Significant < CC.all$Expected,],
caption = "Most under-represented terms of the Cellular Component ontology.")
Most under-represented terms of the Cellular Component ontology.
| 2 |
GO:0045211 |
postsynaptic membrane |
28 |
1 |
1.35 |
6 |
0.015 |
0.0150 |
0.015 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(CC.weight01)[sigTerms]),
caption = paste('Cellular component terms significantly associated with', VAR,
'according to all 3 algorithms', sep=' '))
Cellular component terms significantly associated with hatching according to all 3 algorithms
showSigOfNodes(GOdata.CC, score(CC.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 18
## Number of Edges = 28
##
## $complete.dag
## [1] "A graph with 18 nodes."
showGroupDensity(GOdata.CC, names(score(CC.weight01))[which.min(score(CC.weight01))])

Using the portion of variance explained by hatching
Building the topGO object
I need to generate the topGO objects again, using the alternative gene ordering, based on the proportion of expression-level variance explained by hatching. I miss a way to inform the topGOdata object that the score in this case is reversed, with respect to p-values: the higher it is, the more differentially expressed the gene is. To make sure that GO terms are tested in the same way than when using p-values, I will just reverse the proportion of variance explained by hatching to its complement. Taking this into account, the subset of interesting genes (selection() function) must be defined as the lowest 10% scores, which are the 10% genes with largest portion of variance explained by hatching.
selection <- function(allScores) {return(allScores <= quantile(allScores, probs = 0.10))}
GOdataVar.BP <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'BP',
allGenes = 1.0 - Variance,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
GOdataVar.MF <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'MF',
allGenes = 1.0 - Variance,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
GOdataVar.CC <- new('topGOdata',
description = 'Differential expression among Brachionus plicatilis populations.',
ontology = 'CC',
allGenes = 1.0 - Variance,
annot = annFUN.gene2GO,
gene2GO = allgenes2GO,
geneSelectionFun = selection,
nodeSize = 5)
library(knitr)
DataSummary <- data.frame(ontology = c('BP', 'MF', 'CC'),
Num_Genes = sapply(list(GOdataVar.BP, GOdataVar.MF, GOdataVar.CC), numGenes),
Num_GO_terms = sapply(list(GOdataVar.BP, GOdataVar.MF, GOdataVar.CC), function(x) length(usedGO(x))))
kable(DataSummary, caption='Number of feasible genes or transcripts and number of GO terms used in each data set.')
rm(DataSummary)
Number of feasible genes or transcripts and number of GO terms used in each data set.
| BP |
26235 |
1582 |
| MF |
30580 |
791 |
| CC |
21616 |
390 |
Running the tests
BPvar.elim <- runTest(GOdataVar.BP, algorithm = "elim", statistic = "ks")
BPvar.weight01 <- runTest(GOdataVar.BP, algorithm = "weight01", statistic = "ks")
BPvar.lea <- runTest(GOdataVar.BP, algorithm = "lea", statistic = "ks")
MFvar.elim <- runTest(GOdataVar.MF, algorithm = "elim", statistic = "ks")
MFvar.weight01 <- runTest(GOdataVar.MF, algorithm = "weight01", statistic = "ks")
MFvar.lea <- runTest(GOdataVar.MF, algorithm = "lea", statistic = "ks")
CCvar.elim <- runTest(GOdataVar.CC, algorithm = "elim", statistic = "ks")
CCvar.weight01 <- runTest(GOdataVar.CC, algorithm = "weight01", statistic = "ks")
CCvar.lea <- runTest(GOdataVar.CC, algorithm = "lea", statistic = "ks")
ResultsSummary <- data.frame(ontology = rep(c("BP", "MF", "CC"), each = 3),
algorithm = rep(c("elim", "weight01", "lea"), 3),
TermsTested = sapply(list(BPvar.elim, BPvar.weight01, BPvar.lea, MFvar.elim, MFvar.weight01, MFvar.lea, CCvar.elim, CCvar.weight01, CCvar.lea), function(X) length(score(X))),
Significant = sapply(list(BPvar.elim, BPvar.weight01, BPvar.lea, MFvar.elim, MFvar.weight01, MFvar.lea, CCvar.elim, CCvar.weight01, CCvar.lea), function(X) sum(score(X) < 0.01)))
kable(ResultsSummary, caption="Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.")
Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.
| BP |
elim |
1582 |
27 |
| BP |
weight01 |
1582 |
16 |
| BP |
lea |
1582 |
47 |
| MF |
elim |
791 |
19 |
| MF |
weight01 |
791 |
15 |
| MF |
lea |
791 |
22 |
| CC |
elim |
390 |
11 |
| CC |
weight01 |
390 |
8 |
| CC |
lea |
390 |
17 |
rm(ResultsSummary)
Results
Biological process
orderedTerms <- names(sort(score(BPvar.weight01)))
significant.weight01 <- score(BPvar.weight01)[orderedTerms] <= 0.01
significant.lea <- score(BPvar.lea)[orderedTerms] <= 0.01
significant.elim <- score(BPvar.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
BP.variance.sigTerms <- sigTerms
BPvar.all <- GenTable(GOdataVar.BP, elim=BPvar.elim, weight01=BPvar.weight01, lea=BPvar.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
BPvar.all[BPvar.all$Significant > BPvar.all$Expected,],
caption = "Most over-represented terms of the Biological Process ontology.")
Most over-represented terms of the Biological Process ontology.
| 1 |
GO:0034474 |
U2 snRNA 3’-end processing |
17 |
6 |
1.70 |
1 |
5e-05 |
5e-05 |
5e-05 |
| 2 |
GO:0044782 |
cilium organization |
99 |
10 |
9.89 |
215 |
0.0742 |
0.00025 |
0.0742 |
| 3 |
GO:0001522 |
pseudouridine synthesis |
31 |
9 |
3.10 |
2 |
0.0010 |
0.00102 |
0.0010 |
| 4 |
GO:0046854 |
phosphatidylinositol phosphorylation |
45 |
14 |
4.50 |
3 |
0.0011 |
0.00107 |
0.0011 |
| 5 |
GO:0048015 |
phosphatidylinositol-mediated signaling |
40 |
13 |
4.00 |
5 |
0.0013 |
0.00132 |
0.0013 |
| 6 |
GO:0043547 |
positive regulation of GTPase activity |
27 |
8 |
2.70 |
6 |
0.0016 |
0.00159 |
0.0016 |
| 7 |
GO:0032324 |
molybdopterin cofactor biosynthetic proc… |
5 |
3 |
0.50 |
7 |
0.0016 |
0.00162 |
0.0016 |
| 8 |
GO:0006400 |
tRNA modification |
58 |
10 |
5.80 |
289 |
0.1084 |
0.00246 |
0.2870 |
| 9 |
GO:0008608 |
attachment of spindle microtubules to ki… |
6 |
3 |
0.60 |
13 |
0.0043 |
0.00426 |
0.0043 |
| 10 |
GO:0043063 |
intercellular bridge organization |
6 |
3 |
0.60 |
14 |
0.0043 |
0.00426 |
0.0043 |
| 11 |
GO:0006336 |
DNA replication-independent nucleosome a… |
13 |
3 |
1.30 |
15 |
0.0045 |
0.00451 |
0.0045 |
| 12 |
GO:0007264 |
small GTPase mediated signal transductio… |
228 |
32 |
22.79 |
26 |
0.0085 |
0.00588 |
0.0085 |
| 13 |
GO:0035307 |
positive regulation of protein dephospho… |
5 |
2 |
0.50 |
18 |
0.0061 |
0.00614 |
0.0061 |
| 14 |
GO:0050953 |
sensory perception of light stimulus |
11 |
2 |
1.10 |
971 |
0.5934 |
0.00712 |
0.5934 |
| 16 |
GO:0010256 |
endomembrane system organization |
61 |
10 |
6.10 |
287 |
0.1076 |
0.00803 |
0.2482 |
| 17 |
GO:0071900 |
regulation of protein serine/threonine k… |
20 |
5 |
2.00 |
689 |
0.3791 |
0.01059 |
0.3791 |
| 18 |
GO:0000183 |
chromatin silencing at rDNA |
5 |
1 |
0.50 |
31 |
0.0121 |
0.01209 |
0.0121 |
| 19 |
GO:0006997 |
nucleus organization |
5 |
2 |
0.50 |
35 |
0.0128 |
0.01285 |
0.0128 |
| 20 |
GO:0042147 |
retrograde transport, endosome to Golgi |
26 |
5 |
2.60 |
41 |
0.0148 |
0.01477 |
0.0148 |
| 21 |
GO:0015937 |
coenzyme A biosynthetic process |
8 |
2 |
0.80 |
42 |
0.0148 |
0.01478 |
0.0148 |
| 22 |
GO:0022008 |
neurogenesis |
39 |
9 |
3.90 |
12 |
0.0036 |
0.01516 |
0.0036 |
| 23 |
GO:0032367 |
intracellular cholesterol transport |
5 |
3 |
0.50 |
53 |
0.0176 |
0.01764 |
0.0176 |
| 24 |
GO:0098535 |
de novo centriole assembly involved in m… |
5 |
1 |
0.50 |
56 |
0.0182 |
0.01817 |
0.0182 |
| 25 |
GO:0016579 |
protein deubiquitination |
90 |
12 |
8.99 |
61 |
0.0187 |
0.01866 |
0.0187 |
| 26 |
GO:0006777 |
Mo-molybdopterin cofactor biosynthetic p… |
8 |
2 |
0.80 |
64 |
0.0201 |
0.02010 |
0.0201 |
| 27 |
GO:0006310 |
DNA recombination |
89 |
12 |
8.89 |
232 |
0.0839 |
0.02271 |
0.0839 |
kable(
BPvar.all[BPvar.all$Significant < BPvar.all$Expected,],
caption = "Most under-represented terms of the Biological Process ontology.")
Most under-represented terms of the Biological Process ontology.
| 15 |
GO:0016573 |
histone acetylation |
31 |
3 |
3.1 |
21 |
0.0074 |
0.00736 |
0.0074 |
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(BPvar.weight01)[sigTerms]),
caption = paste('Biological process terms significantly associated with', VAR,
'according to all 3 algorithms', sep=' '))
Biological process terms significantly associated with hatching according to all 3 algorithms
| GO:0034474 |
U2 snRNA 3’-end processing |
Any process involved in forming the mature 3’ end of a U2 snRNA molecule. |
0.0000497 |
| GO:0001522 |
pseudouridine synthesis |
The intramolecular conversion of uridine to pseudouridine within an RNA molecule. This posttranscriptional base modification occurs in tRNA, rRNA, and snRNAs. |
0.0010244 |
| GO:0046854 |
phosphatidylinositol phosphorylation |
The process of introducing one or more phosphate groups into a phosphatidylinositol, any glycerophosphoinositol having one phosphatidyl group esterified to one of the hydroxy groups of inositol. |
0.0010707 |
| GO:0048015 |
phosphatidylinositol-mediated signaling |
A series of molecular signals in which a cell uses a phosphatidylinositol-mediated signaling to convert a signal into a response. Phosphatidylinositols include phosphatidylinositol (PtdIns) and its phosphorylated derivatives. |
0.0013211 |
| GO:0043547 |
positive regulation of GTPase activity |
Any process that activates or increases the activity of a GTPase. |
0.0015858 |
| GO:0032324 |
molybdopterin cofactor biosynthetic process |
The chemical reactions and pathways resulting in the formation of the molybdopterin cofactor (Moco), essential for the catalytic activity of some enzymes, e.g. sulfite oxidase, xanthine dehydrogenase, and aldehyde oxidase. The cofactor consists of a mononuclear molybdenum (Mo-molybdopterin) or tungsten ion (W-molybdopterin) coordinated by one or two molybdopterin ligands. |
0.0016215 |
| GO:0008608 |
attachment of spindle microtubules to kinetochore |
The process in which spindle microtubules become physically associated with the proteins making up the kinetochore complex. |
0.0042561 |
| GO:0043063 |
intercellular bridge organization |
A process that is carried out at the cellular level which results in the assembly, arrangement of constituent parts, or disassembly of the intracellular bridge. An intracellular bridge is a direct link between the cytoplasms of sister cells that allows cells to communicate with one another. |
0.0042561 |
| GO:0006336 |
DNA replication-independent nucleosome assembly |
The formation of nucleosomes outside the context of DNA replication. |
0.0045060 |
| GO:0007264 |
small GTPase mediated signal transduction |
Any series of molecular signals in which a small monomeric GTPase relays one or more of the signals. |
0.0058771 |
| GO:0035307 |
positive regulation of protein dephosphorylation |
Any process that activates or increases the frequency, rate or extent of removal of phosphate groups from a protein. |
0.0061426 |
| GO:0016573 |
histone acetylation |
The modification of a histone by the addition of an acetyl group. |
0.0073586 |
showSigOfNodes(GOdataVar.BP, score(BPvar.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 289
## Number of Edges = 579
##
## $complete.dag
## [1] "A graph with 289 nodes."
Below I plot variance portion, but for the termo found most significant when using p-values, for comparison.
showGroupDensity(GOdataVar.BP, names(score(BP.weight01))[which.min(score(BP.weight01))])

Molecular function
orderedTerms <- names(sort(score(MFvar.weight01)))
significant.weight01 <- score(MFvar.weight01)[orderedTerms] <= 0.01
significant.lea <- score(MFvar.lea)[orderedTerms] <= 0.01
significant.elim <- score(MFvar.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
MF.variance.sigTerms <- sigTerms
MFvar.all <- GenTable(GOdataVar.MF, elim=MFvar.elim, weight01=MFvar.weight01, lea=MFvar.lea,
orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
MFvar.all[MFvar.all$Significant > MFvar.all$Expected,],
caption = "Most over-represented terms of the Molecular Function ontology.")
Most over-represented terms of the Molecular Function ontology.
| GO:0005515 |
protein binding |
4270 |
463 |
421.41 |
2 |
0.00050 |
0.00011 |
4.2e-05 |
| GO:0060090 |
molecular adaptor activity |
45 |
16 |
4.44 |
1 |
0.00014 |
0.00014 |
1.2e-05 |
| GO:0000166 |
nucleotide binding |
2277 |
277 |
224.72 |
68 |
0.05260 |
0.00019 |
0.01496 |
| GO:0016417 |
S-acyltransferase activity |
6 |
5 |
0.59 |
3 |
0.00079 |
0.00079 |
0.00079 |
| GO:0009982 |
pseudouridine synthase activity |
27 |
7 |
2.66 |
4 |
0.00116 |
0.00116 |
0.00116 |
| GO:0005524 |
ATP binding |
1688 |
215 |
166.59 |
6 |
0.00260 |
0.00260 |
0.00260 |
| GO:0003887 |
DNA-directed DNA polymerase activity |
43 |
11 |
4.24 |
7 |
0.00310 |
0.00310 |
0.00310 |
| GO:0004386 |
helicase activity |
94 |
18 |
9.28 |
16 |
0.00846 |
0.00343 |
0.00846 |
| GO:0016748 |
succinyltransferase activity |
5 |
4 |
0.49 |
10 |
0.00430 |
0.00430 |
0.00430 |
| GO:0008138 |
protein tyrosine/serine/threonine phosph… |
45 |
5 |
4.44 |
11 |
0.00478 |
0.00478 |
0.00478 |
| GO:0140101 |
catalytic activity, acting on a tRNA |
139 |
14 |
13.72 |
318 |
0.39110 |
0.00603 |
0.94371 |
| GO:0004721 |
phosphoprotein phosphatase activity |
139 |
18 |
13.72 |
157 |
0.18331 |
0.00626 |
0.18331 |
| GO:0070840 |
dynein complex binding |
6 |
2 |
0.59 |
17 |
0.00967 |
0.00967 |
0.00967 |
| GO:0008093 |
cytoskeletal adaptor activity |
6 |
2 |
0.59 |
18 |
0.00967 |
0.00967 |
0.00967 |
| GO:0046527 |
glucosyltransferase activity |
17 |
5 |
1.68 |
57 |
0.04767 |
0.00992 |
0.04767 |
| GO:0035091 |
phosphatidylinositol binding |
109 |
22 |
10.76 |
15 |
0.00815 |
0.01428 |
0.00815 |
| GO:0043015 |
gamma-tubulin binding |
23 |
5 |
2.27 |
24 |
0.01438 |
0.01438 |
0.01438 |
| GO:0016301 |
kinase activity |
866 |
119 |
85.47 |
25 |
0.01678 |
0.01678 |
0.01678 |
| GO:0008483 |
transaminase activity |
32 |
5 |
3.16 |
21 |
0.01074 |
0.01721 |
0.01074 |
kable(
MFvar.all[MFvar.all$Significant < MFvar.all$Expected,],
caption = "Most under-represented terms of the Molecular Function ontology.")
Most under-represented terms of the Molecular Function ontology.
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(MFvar.weight01)[sigTerms]),
caption = paste('Molecular functions terms significantly associated with', VAR,
'according to all 3 algorithms', sep=' '))
Molecular functions terms significantly associated with hatching according to all 3 algorithms
| GO:0005515 |
protein binding |
Interacting selectively and non-covalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules). |
0.0001129 |
| GO:0060090 |
molecular adaptor activity |
The binding activity of a molecule that brings together two or more molecules through a selective, non-covalent, often stoichiometric interaction, permitting those molecules to function in a coordinated way. |
0.0001387 |
| GO:0016417 |
S-acyltransferase activity |
Catalysis of the transfer of an acyl group to a sulfur atom on the acceptor molecule. |
0.0007907 |
| GO:0009982 |
pseudouridine synthase activity |
Catalysis of the reaction: RNA uridine = RNA pseudouridine. Conversion of uridine in an RNA molecule to pseudouridine by rotation of the C1’-N-1 glycosidic bond of uridine in RNA to a C1’-C5. |
0.0011590 |
| GO:0005524 |
ATP binding |
Interacting selectively and non-covalently with ATP, adenosine 5’-triphosphate, a universally important coenzyme and enzyme regulator. |
0.0026010 |
| GO:0003887 |
DNA-directed DNA polymerase activity |
Catalysis of the reaction: deoxynucleoside triphosphate + DNA(n) = diphosphate + DNA(n+1); the synthesis of DNA from deoxyribonucleotide triphosphates in the presence of a DNA template and a 3’hydroxyl group. |
0.0031021 |
| GO:0004386 |
helicase activity |
Catalysis of the reaction: NTP + H2O = NDP + phosphate, to drive the unwinding of a DNA or RNA helix. |
0.0034318 |
| GO:0016748 |
succinyltransferase activity |
Catalysis of the transfer of a succinyl (3-carboxypropanoyl) group to an acceptor molecule. |
0.0043016 |
| GO:0008138 |
protein tyrosine/serine/threonine phosphatase activity |
Catalysis of the reactions: protein serine + H2O = protein serine + phosphate; protein threonine phosphate + H2O = protein threonine + phosphate; and protein tyrosine phosphate + H2O = protein tyrosine + phosphate. |
0.0047760 |
| GO:0070840 |
dynein complex binding |
Interacting selectively and non-covalently with a dynein complex, a protein complex that contains two or three dynein heavy chains and several light chains, and has microtubule motor activity. |
0.0096650 |
| GO:0008093 |
cytoskeletal adaptor activity |
The binding activity of a molecule that brings together a cytoskeletal protein and one or more other molecules, permitting them to function in a coordinated way. |
0.0096650 |
showSigOfNodes(GOdataVar.MF, score(MFvar.weight01),
firstSigNodes = sum(significant.elim),
wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 64
## Number of Edges = 76
##
## $complete.dag
## [1] "A graph with 64 nodes."
I plot variance portion, but for the term found most significant when using p-values, for comparison.
showGroupDensity(GOdataVar.MF, names(score(MF.weight01))[which.min(score(MF.weight01))])

Cellular component
orderedTerms <- names(sort(score(CCvar.weight01)))
significant.weight01 <- score(CCvar.weight01)[orderedTerms] <= 0.01
significant.lea <- score(CCvar.lea)[orderedTerms] <= 0.01
significant.elim <- score(CCvar.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
if (length(sigTerms) == 0) {
sigTerms <- orderedTerms[significant.lea & significant.elim]
}
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
CC.variance.sigTerms <- sigTerms
CCvar.all <- GenTable(GOdataVar.CC, elim=CCvar.elim, weight01=CCvar.weight01, lea=CCvar.lea,
orderBy="weight01", ranksOf="elim", topNodes=max(sum(significant.elim), 10))
kable(
CCvar.all[CCvar.all$Significant > CCvar.all$Expected,],
caption = "Most over-represented terms of the Cellular Component ontology.")
Most over-represented terms of the Cellular Component ontology.
| GO:0036064 |
ciliary basal body |
14 |
5 |
1.33 |
3 |
0.00136 |
0.0014 |
0.00136 |
| GO:0031981 |
nuclear lumen |
280 |
38 |
26.68 |
2 |
0.00079 |
0.0028 |
0.00079 |
| GO:1990204 |
oxidoreductase complex |
28 |
10 |
2.67 |
1 |
0.00061 |
0.0029 |
0.00061 |
| GO:0045239 |
tricarboxylic acid cycle enzyme complex |
5 |
4 |
0.48 |
5 |
0.00415 |
0.0041 |
0.00415 |
| GO:0035658 |
Mon1-Ccz1 complex |
10 |
3 |
0.95 |
6 |
0.00487 |
0.0049 |
0.00487 |
| GO:0005815 |
microtubule organizing center |
96 |
16 |
9.15 |
22 |
0.01319 |
0.0081 |
0.13310 |
| GO:0005813 |
centrosome |
50 |
10 |
4.76 |
10 |
0.00810 |
0.0081 |
0.00810 |
| GO:0005634 |
nucleus |
1025 |
119 |
97.68 |
11 |
0.00820 |
0.0092 |
0.00021 |
| GO:0005720 |
nuclear heterochromatin |
5 |
1 |
0.48 |
17 |
0.01151 |
0.0115 |
0.01151 |
| GO:0008622 |
epsilon DNA polymerase complex |
11 |
5 |
1.05 |
21 |
0.01289 |
0.0129 |
0.01289 |
| GO:1902555 |
endoribonuclease complex |
10 |
1 |
0.95 |
27 |
0.01995 |
0.0161 |
0.01995 |
kable(
CCvar.all[CCvar.all$Significant < CCvar.all$Expected,],
caption = "Most under-represented terms of the Cellular Component ontology.")
Most under-represented terms of the Cellular Component ontology.
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
lea=significant.lea,
elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
Definition=Definition(GOTERM[sigTerms]),
PValue=score(CCvar.weight01)[sigTerms]),
caption = paste('Cellular component terms significantly associated with', VAR,
'according to all 3 algorithms', sep=' '))
Cellular component terms significantly associated with hatching according to all 3 algorithms
| GO:0036064 |
ciliary basal body |
A membrane-tethered, short cylindrical array of microtubules and associated proteins found at the base of a eukaryotic cilium (also called flagellum) that is similar in structure to a centriole and derives from it. The cilium basal body is the site of assembly and remodelling of the cilium and serves as a nucleation site for axoneme growth. As well as anchoring the cilium, it is thought to provide a selective gateway regulating the entry of ciliary proteins and vesicles by intraflagellar transport. |
0.0013574 |
| GO:0031981 |
nuclear lumen |
The volume enclosed by the nuclear inner membrane. |
0.0027521 |
| GO:1990204 |
oxidoreductase complex |
Any protein complex that possesses oxidoreductase activity. |
0.0029066 |
| GO:0045239 |
tricarboxylic acid cycle enzyme complex |
Any of the heteromeric enzymes that act in the TCA cycle. |
0.0041466 |
| GO:0035658 |
Mon1-Ccz1 complex |
A protein complex that functions as a guanine nucleotide exchange factor (GEF) and converts Rab-GDP to Rab-GTP. In S. cerevisiae, this complex consists of at least Mon1 and Ccz1, and serves as a GEF for the Rab Ypt7p. |
0.0048672 |
| GO:0005813 |
centrosome |
A structure comprised of a core structure (in most organisms, a pair of centrioles) and peripheral material from which a microtubule-based structure, such as a spindle apparatus, is organized. Centrosomes occur close to the nucleus during interphase in many eukaryotic cells, though in animal cells it changes continually during the cell-division cycle. |
0.0081015 |
| GO:0005634 |
nucleus |
A membrane-bounded organelle of eukaryotic cells in which chromosomes are housed and replicated. In most cells, the nucleus contains all of the cell’s chromosomes except the organellar chromosomes, and is the site of RNA synthesis and processing. In some species, or in specialized cell types, RNA metabolism or DNA replication may be absent. |
0.0091673 |
showSigOfNodes(GOdataVar.CC, score(CCvar.weight01),
firstSigNodes = sum(significant.weight01),
wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 46
## Number of Edges = 77
##
## $complete.dag
## [1] "A graph with 46 nodes."
For comparison, I plot the distribution of variance portion for the CC term found most significant when using p-values.
showGroupDensity(GOdataVar.CC, names(score(CC.weight01))[which.min(score(CC.weight01))])

Comparison between the two ordering of genes.
Biological process
allTerms <- usedGO(GOdata.BP)
vennDiagram(vennCounts(cbind(PValue = allTerms %in% BP.pvalue.sigTerms,
Variance = allTerms %in% BP.variance.sigTerms)))

ggplot(data = data.frame(PValue = rank(score(BP.weight01))[allTerms],
Variance = rank(score(BPvar.weight01))[allTerms]),
mapping = aes(x = PValue, y = Variance)) +
geom_point() + geom_smooth() + xlab('Using gene p-values') +
ylab('Using portion of explained variance') +
ggtitle('Ordering of BP terms by significance')

Molecular function
allTerms <- usedGO(GOdata.MF)
vennDiagram(vennCounts(cbind(PValue = allTerms %in% MF.pvalue.sigTerms,
Variance = allTerms %in% MF.variance.sigTerms)))

ggplot(data = data.frame(PValue = rank(score(MF.weight01))[allTerms],
Variance = rank(score(MFvar.weight01))[allTerms]),
mapping = aes(x = PValue, y = Variance)) +
geom_point() + geom_smooth() + xlab('Using gene p-values') +
ylab('Using portion of explained variance') +
ggtitle('Ordering of MF terms by significance')

Cellular component
allTerms <- usedGO(GOdata.CC)
vennDiagram(vennCounts(cbind(PValue = allTerms %in% CC.pvalue.sigTerms,
Variance = allTerms %in% CC.variance.sigTerms)))

ggplot(data = data.frame(PValue = rank(score(CC.weight01))[allTerms],
Variance = rank(score(CCvar.weight01))[allTerms]),
mapping = aes(x = PValue, y = Variance)) +
geom_point() + geom_smooth() + xlab('Using gene p-values') +
ylab('Using portion of explained variance') +
ggtitle('Ordering of CC terms by significance')

Session info
I save the main variables to be able to load them in a new R session later.
save(allgenes2GO,
GOdata.BP, BP.elim, BP.weight01, BP.lea, BP.pvalue.sigTerms,
GOdata.MF, MF.elim, MF.weight01, MF.lea, MF.pvalue.sigTerms,
GOdata.CC, CC.elim, CC.weight01, CC.lea, CC.pvalue.sigTerms,
GOdataVar.BP, BPvar.elim, BPvar.weight01, BPvar.lea, BP.variance.sigTerms,
GOdataVar.MF, MFvar.elim, MFvar.weight01, MFvar.lea, MF.variance.sigTerms,
GOdataVar.CC, CCvar.elim, CCvar.weight01, CCvar.lea, CC.variance.sigTerms,
file = paste('Enrichment', TAG, VAR, 'RData', sep='.'))
sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=ca_ES.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=es_ES.UTF-8 LC_COLLATE=ca_ES.UTF-8
## [5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=ca_ES.UTF-8
## [7] LC_PAPER=es_ES.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid stats4 parallel stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] Rgraphviz_2.30.0 ggplot2_3.2.1 limma_3.42.0
## [4] knitr_1.27 topGO_2.38.1 SparseM_1.78
## [7] GO.db_3.10.0 AnnotationDbi_1.48.0 IRanges_2.20.2
## [10] S4Vectors_0.24.3 Biobase_2.46.0 graph_1.64.0
## [13] BiocGenerics_0.32.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_0.2.5 xfun_0.12 purrr_0.3.3 splines_3.6.2
## [5] lattice_0.20-38 colorspace_1.4-1 vctrs_0.2.1 htmltools_0.4.0
## [9] yaml_2.2.0 mgcv_1.8-31 blob_1.2.1 rlang_0.4.2
## [13] pillar_1.4.3 glue_1.3.1 withr_2.1.2 DBI_1.1.0
## [17] bit64_0.9-7 matrixStats_0.55.0 lifecycle_0.1.0 stringr_1.4.0
## [21] munsell_0.5.0 gtable_0.3.0 memoise_1.1.0 evaluate_0.14
## [25] labeling_0.3 highr_0.8 Rcpp_1.0.3 backports_1.1.5
## [29] scales_1.1.0 farver_2.0.3 bit_1.1-15.1 digest_0.6.23
## [33] stringi_1.4.5 dplyr_0.8.3 tools_3.6.2 magrittr_1.5
## [37] lazyeval_0.2.2 tibble_2.1.3 RSQLite_2.2.0 crayon_1.3.4
## [41] pkgconfig_2.0.3 zeallot_0.1.0 Matrix_1.2-18 assertthat_0.2.1
## [45] rmarkdown_2.1 R6_2.4.1 nlme_3.1-143 compiler_3.6.2